Data Visualization Project 02

Introduction

NBA Teams

The National Basketball Association (NBA) is one of the premier professional basketball leagues in the world. Analyzing team performance and statistics can provide insights into game strategies, player effectiveness, and overall team success. This report aims to explore various facets of NBA teams’ performance, focusing on key metrics and providing an in-depth analysis of team statistics.

Background

The NBA consists of 30 teams divided into two conferences: the Eastern Conference and the Western Conference. Each team plays an 82-game regular season, with the top eight teams from each conference advancing to the playoffs. Winning an NBA championship is the ultimate goal for any team (We are the Champions - Celtics ☘️), requiring not just skill but also effective strategies and consistent performance.

Methodology

This analysis uses the NBA champions dataset, which contains 220 game-level records with 24 variables covering the years 1980 to 2018. We use R for data manipulation, visualization, and statistical modeling. The focus is on per-game box-score statistics, including shooting, rebounding, and scoring figures, which together give an overview of team performance.

# load libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Warning: package 'sf' was built under R version 4.3.2
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(htmlwidgets)
library(broom)
# load data
file_path <- "https://raw.githubusercontent.com/reisanar/datasets/master/NBAchampionsdata.csv"
data <- read_csv(file_path)
## Rows: 220 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): Team
## dbl (23): Year, Game, Win, Home, MP, FG, FGA, FGP, TP, TPA, TPP, FT, FTA, FT...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data
## # A tibble: 220 × 24
##     Year Team     Game   Win  Home    MP    FG   FGA   FGP    TP   TPA    TPP
##    <dbl> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1  1980 Lakers      1     1     1   240    48    89 0.539     0     0 NA    
##  2  1980 Lakers      2     0     1   240    48    95 0.505     0     1  0    
##  3  1980 Lakers      3     1     0   240    44    92 0.478     0     1  0    
##  4  1980 Lakers      4     0     0   240    44    93 0.473     0     0 NA    
##  5  1980 Lakers      5     1     1   240    41    91 0.451     0     0 NA    
##  6  1980 Lakers      6     1     0   240    45    92 0.489     0     2  0    
##  7  1981 Celtics     1     1     1   240    41    95 0.432     0     1  0    
##  8  1981 Celtics     2     0     1   240    41    82 0.5       0     3  0    
##  9  1981 Celtics     3     1     0   240    40    89 0.449     2     3  0.667
## 10  1981 Celtics     4     0     0   240    35    74 0.473     0     3  0    
## # ℹ 210 more rows
## # ℹ 12 more variables: FT <dbl>, FTA <dbl>, FTP <dbl>, ORB <dbl>, DRB <dbl>,
## #   TRB <dbl>, AST <dbl>, STL <dbl>, BLK <dbl>, TOV <dbl>, PF <dbl>, PTS <dbl>

Check the summary statistics and structure of the dataset, and look for missing values:

# Summary statistics 
summary(data)
##       Year          Team                Game          Win        
##  Min.   :1980   Length:220         Min.   :1.0   Min.   :0.0000  
##  1st Qu.:1989   Class :character   1st Qu.:2.0   1st Qu.:0.0000  
##  Median :1999   Mode  :character   Median :3.0   Median :1.0000  
##  Mean   :1999                      Mean   :3.4   Mean   :0.7091  
##  3rd Qu.:2009                      3rd Qu.:5.0   3rd Qu.:1.0000  
##  Max.   :2018                      Max.   :7.0   Max.   :1.0000  
##                                                                  
##       Home              MP              FG             FGA        
##  Min.   :0.0000   Min.   :240.0   Min.   :25.00   Min.   : 62.00  
##  1st Qu.:0.0000   1st Qu.:240.0   1st Qu.:33.00   1st Qu.: 75.00  
##  Median :1.0000   Median :240.0   Median :37.00   Median : 80.00  
##  Mean   :0.5045   Mean   :242.4   Mean   :37.75   Mean   : 80.88  
##  3rd Qu.:1.0000   3rd Qu.:240.0   3rd Qu.:42.00   3rd Qu.: 87.00  
##  Max.   :1.0000   Max.   :315.0   Max.   :56.00   Max.   :130.00  
##                                                                   
##       FGP               TP              TPA             TPP        
##  Min.   :0.2890   Min.   : 0.000   Min.   : 0.00   Min.   :0.0000  
##  1st Qu.:0.4298   1st Qu.: 2.000   1st Qu.: 6.75   1st Qu.:0.2500  
##  Median :0.4670   Median : 5.000   Median :15.00   Median :0.3585  
##  Mean   :0.4665   Mean   : 5.355   Mean   :14.60   Mean   :0.3422  
##  3rd Qu.:0.5000   3rd Qu.: 8.000   3rd Qu.:20.00   3rd Qu.:0.4440  
##  Max.   :0.6170   Max.   :18.000   Max.   :43.00   Max.   :1.0000  
##                                                    NA's   :6       
##        FT             FTA             FTP              ORB      
##  Min.   : 5.00   Min.   : 8.00   Min.   :0.3680   Min.   : 3.0  
##  1st Qu.:15.00   1st Qu.:21.00   1st Qu.:0.6670   1st Qu.: 9.0  
##  Median :19.00   Median :26.00   Median :0.7400   Median :12.0  
##  Mean   :19.93   Mean   :27.13   Mean   :0.7356   Mean   :12.3  
##  3rd Qu.:24.00   3rd Qu.:32.25   3rd Qu.:0.8157   3rd Qu.:15.0  
##  Max.   :43.00   Max.   :57.00   Max.   :1.0000   Max.   :27.0  
##                                                                 
##       DRB             TRB            AST            STL        
##  Min.   :16.00   Min.   :22.0   Min.   :11.0   Min.   : 1.000  
##  1st Qu.:27.00   1st Qu.:38.0   1st Qu.:18.0   1st Qu.: 6.000  
##  Median :30.00   Median :42.0   Median :22.0   Median : 8.000  
##  Mean   :30.20   Mean   :42.5   Mean   :22.5   Mean   : 7.855  
##  3rd Qu.:33.25   3rd Qu.:47.0   3rd Qu.:27.0   3rd Qu.:10.000  
##  Max.   :44.00   Max.   :59.0   Max.   :44.0   Max.   :18.000  
##                                                                
##       BLK              TOV              PF             PTS        
##  Min.   : 0.000   Min.   : 4.00   Min.   :12.00   Min.   : 71.00  
##  1st Qu.: 3.000   1st Qu.:11.00   1st Qu.:20.00   1st Qu.: 90.75  
##  Median : 5.000   Median :14.00   Median :23.00   Median :101.00  
##  Mean   : 5.323   Mean   :13.71   Mean   :22.86   Mean   :100.79  
##  3rd Qu.: 7.000   3rd Qu.:16.00   3rd Qu.:26.00   3rd Qu.:109.00  
##  Max.   :14.000   Max.   :26.00   Max.   :33.00   Max.   :141.00  
## 
# Structure of the dataset
str(data)
## spc_tbl_ [220 × 24] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Year: num [1:220] 1980 1980 1980 1980 1980 ...
##  $ Team: chr [1:220] "Lakers" "Lakers" "Lakers" "Lakers" ...
##  $ Game: num [1:220] 1 2 3 4 5 6 1 2 3 4 ...
##  $ Win : num [1:220] 1 0 1 0 1 1 1 0 1 0 ...
##  $ Home: num [1:220] 1 1 0 0 1 0 1 1 0 0 ...
##  $ MP  : num [1:220] 240 240 240 240 240 240 240 240 240 240 ...
##  $ FG  : num [1:220] 48 48 44 44 41 45 41 41 40 35 ...
##  $ FGA : num [1:220] 89 95 92 93 91 92 95 82 89 74 ...
##  $ FGP : num [1:220] 0.539 0.505 0.478 0.473 0.451 0.489 0.432 0.5 0.449 0.473 ...
##  $ TP  : num [1:220] 0 0 0 0 0 0 0 0 2 0 ...
##  $ TPA : num [1:220] 0 1 1 0 0 2 1 3 3 3 ...
##  $ TPP : num [1:220] NA 0 0 NA NA 0 0 0 0.667 0 ...
##  $ FT  : num [1:220] 13 8 23 14 26 33 16 8 12 16 ...
##  $ FTA : num [1:220] 15 12 30 19 33 35 20 13 19 24 ...
##  $ FTP : num [1:220] 0.867 0.667 0.767 0.737 0.788 0.943 0.8 0.615 0.632 0.667 ...
##  $ ORB : num [1:220] 12 15 22 18 19 17 25 14 16 17 ...
##  $ DRB : num [1:220] 31 37 34 31 37 35 29 34 28 30 ...
##  $ TRB : num [1:220] 43 52 56 49 56 52 54 48 44 47 ...
##  $ AST : num [1:220] 30 32 20 23 28 27 23 17 24 22 ...
##  $ STL : num [1:220] 5 12 5 12 7 14 6 6 12 5 ...
##  $ BLK : num [1:220] 9 7 5 6 6 4 5 7 6 6 ...
##  $ TOV : num [1:220] 17 26 20 19 21 17 19 22 11 22 ...
##  $ PF  : num [1:220] 24 27 25 22 27 22 21 27 25 22 ...
##  $ PTS : num [1:220] 109 104 111 102 108 123 98 90 94 86 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Year = col_double(),
##   ..   Team = col_character(),
##   ..   Game = col_double(),
##   ..   Win = col_double(),
##   ..   Home = col_double(),
##   ..   MP = col_double(),
##   ..   FG = col_double(),
##   ..   FGA = col_double(),
##   ..   FGP = col_double(),
##   ..   TP = col_double(),
##   ..   TPA = col_double(),
##   ..   TPP = col_double(),
##   ..   FT = col_double(),
##   ..   FTA = col_double(),
##   ..   FTP = col_double(),
##   ..   ORB = col_double(),
##   ..   DRB = col_double(),
##   ..   TRB = col_double(),
##   ..   AST = col_double(),
##   ..   STL = col_double(),
##   ..   BLK = col_double(),
##   ..   TOV = col_double(),
##   ..   PF = col_double(),
##   ..   PTS = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Check for missing values
colSums(is.na(data))
## Year Team Game  Win Home   MP   FG  FGA  FGP   TP  TPA  TPP   FT  FTA  FTP  ORB 
##    0    0    0    0    0    0    0    0    0    0    0    6    0    0    0    0 
##  DRB  TRB  AST  STL  BLK  TOV   PF  PTS 
##    0    0    0    0    0    0    0    0
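
The only missing values are the six in TPP (three-point percentage). In the rows printed earlier, TPP is NA exactly where TPA is 0, which is what a percentage with zero attempts would produce. The following is a minimal sketch for checking that pattern. It uses a small hand-copied sample of the first ten printed rows rather than the full dataset, and the names `sample_rows` and `na_check` are hypothetical.

```r
library(dplyr)

# TP, TPA and TPP for the first ten rows as printed above
# (1980 Lakers games 1-6, 1981 Celtics games 1-4).
sample_rows <- data.frame(
  Year = c(rep(1980, 6), rep(1981, 4)),
  Team = c(rep("Lakers", 6), rep("Celtics", 4)),
  TP   = c(0, 0, 0, 0, 0, 0, 0, 0, 2, 0),
  TPA  = c(0, 1, 1, 0, 0, 2, 1, 3, 3, 3),
  TPP  = c(NA, 0, 0, NA, NA, 0, 0, 0, 0.667, 0)
)

# Cross-tabulate "no three-point attempts" against "missing percentage".
# If NAs arise only from zero attempts, the two flags always agree.
na_check <- sample_rows %>%
  count(no_attempts = TPA == 0, tpp_missing = is.na(TPP))

na_check
```

Running the same `count()` call on `data` would show whether all six missing values in the full dataset follow this pattern.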

Celtics Home and Away Wins

To analyze the home and away wins for the Boston Celtics:

# make columns numeric
data <- data %>%
  mutate(across(c(Win, Home, DRB), as.numeric))

# Filter data for Celtics
celtics_data <- data %>% filter(Team == "Celtics")

# Summarize home and away wins
celtics_home_away <- celtics_data %>%
  summarise(HomeWins = sum(ifelse(Home == 1 & Win == 1, 1, 0), na.rm = TRUE),
            AwayWins = sum(ifelse(Home == 0 & Win == 1, 1, 0), na.rm = TRUE))

celtics_home_away
## # A tibble: 1 × 2
##   HomeWins AwayWins
##      <dbl>    <dbl>
## 1       11        5
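
Win counts alone do not show how many games were played at each location. A minimal sketch of a win-rate summary follows. It uses only the four 1981 Celtics rows printed earlier as illustrative input, and the names `celtics_sample` and `location_summary` are hypothetical; applied to `celtics_data`, the same pipeline would give rates over all Celtics games in the dataset.

```r
library(dplyr)

# Win and Home flags for the 1981 Celtics games 1-4, as printed earlier.
celtics_sample <- data.frame(
  Game = 1:4,
  Win  = c(1, 0, 1, 0),
  Home = c(1, 1, 0, 0)
)

# Games played, wins, and win rate for each location.
location_summary <- celtics_sample %>%
  mutate(Location = if_else(Home == 1, "Home", "Away")) %>%
  group_by(Location) %>%
  summarise(
    Games   = n(),
    Wins    = sum(Win),
    WinRate = mean(Win)
  )

location_summary
```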

Analysis 1: Interactive Plot

# convert columns to numeric
data <- data %>%
  mutate(across(c(Win, Home, DRB, PTS), as.numeric))

# Filter data for Celtics - favorite basketball team
celtics_data <- data %>% filter(Team == "Celtics")

# Getting sum of total points by Celtics in home and away games
celtics_points <- celtics_data %>%
  group_by(Home) %>%
  summarise(TotalPoints = sum(PTS, na.rm = TRUE))

# Create interactive plot
p <- celtics_points %>%
  ggplot(aes(x = factor(Home, levels = c(0, 1), labels = c("Away", "Home")), y = TotalPoints, fill = factor(Home, levels = c(0, 1), labels = c("Away", "Home")))) +
  geom_bar(stat = "identity") +
  labs(
    title = "Total Points by Celtics in Home and Away Games",
    subtitle = "Total points scored by the Celtics in home and away games.",
    x = "Game Location",
    y = "Total Points"
  ) +
  scale_fill_manual(values = c("Away" = "blue", "Home" = "red")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    legend.title = element_blank()
  )

# Add interactive elements with plotly
interactive_plot <- ggplotly(p, tooltip = c("x", "y"))

# Customize hover text
interactive_plot <- interactive_plot %>%
  layout(
    hoverlabel = list(
      bgcolor = "white",
      bordercolor = "black",
      font = list(size = 12)
    )
  )

# show plot
interactive_plot
# Save the plot as HTML
saveWidget(interactive_plot, file = "celtics_points_interactive.html")
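
`scale_fill_manual()` matches named values against the levels of the variable mapped to `fill`, so the names "Away" and "Home" only take effect when the fill factor carries those labels. The sketch below illustrates this on a two-row stand-in for `celtics_points` and inspects the colors ggplot2 assigns to each bar. The totals are the home and away sums of the four 1981 Celtics games shown earlier, not the full-data totals, and the names `points_sample` and `p_sample` are hypothetical.

```r
library(ggplot2)

# Stand-in for celtics_points: away total 94 + 86, home total 98 + 90.
points_sample <- data.frame(Home = c(0, 1), TotalPoints = c(180, 188))

# Label the factor so its levels match the names given to scale_fill_manual().
points_sample$Location <- factor(points_sample$Home,
                                 levels = c(0, 1),
                                 labels = c("Away", "Home"))

p_sample <- ggplot(points_sample,
                   aes(x = Location, y = TotalPoints, fill = Location)) +
  geom_col() +
  scale_fill_manual(values = c("Away" = "blue", "Home" = "red"))

# Fill colors resolved for each bar once the scale has been applied.
assigned_fills <- ggplot_build(p_sample)$data[[1]]$fill
assigned_fills
```

If the fill variable kept the raw levels "0" and "1", none of the named values would match and the bars would not receive the intended colors.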

Analysis 2: Spatial Visualization

# Locations and team colors for selected teams
team_locations <- data.frame(
  Team = c("Lakers", "Celtics", "Sixers"),
  City = c("Los Angeles", "Boston", "Philadelphia"),
  Latitude = c(34.0522, 42.3601, 39.9526),
  Longitude = c(-118.2437, -71.0589, -75.1652),
  Color = c("yellow", "green", "red")
)

# Convert to spatial 
team_locations <- st_as_sf(team_locations, coords = c("Longitude", "Latitude"), crs = 4326)

# Get US map data
us_map <- map_data("state")

# plot
spatial_plot <- ggplot() +
  geom_polygon(data = us_map, aes(x = long, y = lat, group = group), fill = "gray95", color = "black", linewidth = 0.2) +
  geom_sf(data = team_locations, aes(geometry = geometry, fill = Team), size = 5, shape = 21, color = "black") +
  scale_fill_manual(values = c("Lakers" = "yellow", "Celtics" = "green", "Sixers" = "red")) +
  labs(title = "NBA Team Locations", x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.title = element_blank(),
    plot.title = element_text(hjust = 0.5)
  )

spatial_plot

# Save plot
ggsave("nba_team_locations.png", plot = spatial_plot, width = 10, height = 6)
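
`st_as_sf()` reads `coords` in x, y order, which for geographic data means longitude first and latitude second. The sketch below is one way to confirm that the points landed where intended, reusing the three cities above; the object names `cities`, `pts`, `xy`, and `crs_code` are assumptions for illustration.

```r
library(sf)

cities <- data.frame(
  Team      = c("Lakers", "Celtics", "Sixers"),
  Latitude  = c(34.0522, 42.3601, 39.9526),
  Longitude = c(-118.2437, -71.0589, -75.1652)
)

# Longitude is the x coordinate, latitude the y coordinate; EPSG:4326 is WGS 84.
pts <- st_as_sf(cities, coords = c("Longitude", "Latitude"), crs = 4326)

# Matrix with columns X (longitude) and Y (latitude), one row per point.
xy <- st_coordinates(pts)

# EPSG code stored with the geometry.
crs_code <- st_crs(pts)$epsg

xy
crs_code
```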

Analysis 3: Visualization of a Model

In this section, we will build a linear model to explore the relationship between defensive rebounds (DRB) and points scored (PTS) by NBA teams. Understanding this relationship can help teams improve their defensive strategies to increase their chances of scoring and winning games.

# Ensure columns are numeric
data <- data %>%
  mutate(across(c(Win, Home, DRB, PTS), as.numeric))

# Linear Model and Coefficients Plot
model <- lm(PTS ~ DRB, data = data)
model_summary <- summary(model)

# Get the coefficients and their standard errors
coef_data <- broom::tidy(model) %>%
  mutate(term = recode(term, `(Intercept)` = "Intercept", `DRB` = "Defensive Rebounds"))

# Create interactive plot with hover annotations
coef_plot <- coef_data %>%
  ggplot(aes(x = term, y = estimate, fill = term, text = paste(
    "Term: ", term, "<br>",
    "Estimate: ", round(estimate, 2), "<br>",
    "Std Error: ", round(std.error, 2), "<br>",
    "This plot shows the relationship between defensive rebounds and points scored.<br>",
    "The bars represent the estimated coefficients, and the lines represent the uncertainty around these estimates."
  ))) +
  geom_col(width = 0.6) +
  geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.2) +
  scale_fill_manual(values = c("Defensive Rebounds" = "#1f77b4", "Intercept" = "#ff7f0e")) +
  labs(
    title = "Impact of Defensive Rebounds on Points Scored",
    subtitle = "Linear Model Coefficients with Error Bars",
    x = "Model Terms",
    y = "Coefficient Estimate"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5)
  )

interactive_plot <- ggplotly(coef_plot, tooltip = "text") %>%
  layout(margin = list(t = 80))  # Adjust top margin

# Residuals Plot
residuals_data <- augment(model)
residuals_plot <- residuals_data %>%
  ggplot(aes(x = .fitted, y = .resid, text = paste(
    "Fitted Value: ", round(.fitted, 2), "<br>",
    "Residual: ", round(.resid, 2)
  ))) +
  geom_point(color = "#1f77b4") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted Values",
    y = "Residuals"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

interactive_residuals_plot <- ggplotly(residuals_plot, tooltip = "text") %>%
  layout(margin = list(t = 80))  # Adjust top margin

# Save interactive plots in HTML format
saveWidget(interactive_plot, file = "model_coefficients_interactive.html")
saveWidget(interactive_residuals_plot, file = "residuals_plot_interactive.html")

# Show plots
interactive_plot
interactive_residuals_plot
# Display model summary
model_summary
## 
## Call:
## lm(formula = PTS ~ DRB, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.552 -10.032  -0.201   8.911  42.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  87.9413     5.5924  15.725   <2e-16 ***
## DRB           0.4253     0.1828   2.326   0.0209 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.18 on 218 degrees of freedom
## Multiple R-squared:  0.02423,    Adjusted R-squared:  0.01975 
## F-statistic: 5.412 on 1 and 218 DF,  p-value: 0.02091

Motivation for the Model: The motivation behind modeling points scored (PTS) as a function of defensive rebounds (DRB) lies in understanding how defensive actions contribute to offensive success. The fitted slope suggests that each additional defensive rebound is associated with about 0.43 more points, and the relationship is statistically significant at the 5% level (p = 0.021). The effect is weak, however: with an R-squared of about 0.02, defensive rebounds account for only around 2% of the variation in points, and the model describes an association rather than a causal effect. The result is consistent with the idea that defensive effort supports scoring, but it should be read alongside other factors not included in this model.
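
To put the size of the DRB coefficient in context, the sketch below reworks the printed model summary by hand: an approximate 95% confidence interval for the slope, the fitted points at a chosen rebound count, and a check that the squared t value matches the reported F statistic. The numbers are copied from the output above; the choice of 30 rebounds (the median DRB in the summary statistics) is only for illustration, and the variable names are hypothetical.

```r
# Values copied from the printed model summary.
est_intercept <- 87.9413
est_slope     <- 0.4253
se_slope      <- 0.1828
df_resid      <- 218

# Approximate 95% confidence interval for the DRB slope.
t_crit   <- qt(0.975, df_resid)
slope_ci <- est_slope + c(-1, 1) * t_crit * se_slope

# Fitted points for a game with 30 defensive rebounds.
pred_30 <- est_intercept + est_slope * 30

# With a single predictor, the F statistic equals the squared t value.
t_value <- est_slope / se_slope
f_check <- t_value^2

slope_ci
pred_30
f_check
```

The interval runs from roughly 0.07 to 0.79 points per rebound, so the data are compatible with anything from a very small to a moderate effect.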

Key Strategies

Based on the analysis, several key strategies can be identified for NBA teams aiming for success:

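  1. Effective Defensive Rebounding: The model indicates a statistically significant but weak positive association between defensive rebounds and points scored (about 0.43 points per rebound, with an R-squared near 0.02). Teams may benefit from improving their defensive rebounding to create more scoring opportunities, though it is only one of many contributing factors.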

  2. Consistent Performance at Home and Away: In this dataset the Celtics recorded 11 home wins against 5 away wins, which suggests that wins come more easily at home. Teams should develop strategies to perform well regardless of location.

  3. Utilizing Advanced Statistics: Advanced metrics like the ones used in this report can provide deeper insights into team performance. Teams should incorporate such analyses into their strategy development.

Conclusion

This comprehensive analysis of NBA team statistics provides valuable insights into the factors contributing to team success. By focusing on key metrics such as defensive rebounds and performance consistency, teams can refine their strategies and improve their chances of winning championships. The use of interactive and spatial visualizations further enhances the understanding of these metrics, making the analysis accessible and engaging for both analysts and fans.